4/6/2021
Computational Bayesian methods
Spatio-temporal statistics
Science-motivated statistical modeling
Extremely noisy relationship between data and process.
Lack of good statistical software.
Spatially explicit reconstructions of climate variables is important.
Prior work – 4 sites and total compute time of approximately 28 hours.
For the \(i\)th observation at location \(\mathbf{s}\) and time \(t\),
\[ \begin{align*} \mathbf{y}_i \left( \mathbf{s}, t \right) & = \left( y_{i1} \left( \mathbf{s}, t \right), \ldots, y_{id} \left( \mathbf{s}, t \right) \right)' \end{align*} \]
is a \(d\)-dimensional compositional count observation.
\[ \begin{align*} \mathbf{y}_i \left( \mathbf{s}, t \right) | \mathbf{p}\left( \mathbf{s}, t \right) & \sim \operatorname{Multinomial} \left( M_i \left( \mathbf{s}, t \right), \mathbf{p}\left( \mathbf{s}, t \right) \right) \end{align*} \]
Compositional count data.
The pollen data are highly variable and overdispersed.
Mixture over a Dirichlet distribution.
\[ \begin{align*} \mathbf{p}\left( \mathbf{s}, t \right) | \boldsymbol{\alpha}\left( \mathbf{s}, t \right) & \sim \operatorname{Dirichlet} \left( \boldsymbol{\alpha}\left( \mathbf{s}, t \right) \right) \end{align*} \]
\[ \begin{align*} \mathbf{y}_i\left( \mathbf{s}, t \right) | \boldsymbol{\alpha}\left( \mathbf{s}, t \right) & \sim \operatorname{Dirichlet-Multinomial} \left( M_i\left( \mathbf{s}, t \right), \boldsymbol{\alpha}\left( \mathbf{s}, t \right) \right) \end{align*} \]
\[ \begin{align*} \operatorname{log} \left( \boldsymbol{\alpha} \left( \mathbf{s}, t \right) \right) & = \mathbf{z}\left( \mathbf{s}, t \right) \boldsymbol{\beta}. \end{align*} \]
\[ \begin{align*} \operatorname{log} \left( \boldsymbol{\alpha} \left( \mathbf{s}, t \right) \right) & = \mathbf{B} \left( \mathbf{z}\left( \mathbf{s}, t \right) \right) \boldsymbol{\beta} \end{align*} \]
Nolan C, Tipton JR, Booth RK, Hooten MB, Jackson ST (In Press)., “Comparing and improving methods for reconstructing peatland, water-table depth from testate amoebae.” The Holocene.
Hefley TJ, Broms KM, Brost BM, Buderman FE, Kay SL, Scharf HR, Tipton, JR, Williams PJ, Hooten MB (2017). “The basis function approach for, modeling autocorrelation in ecological data.” Ecology, 98(3),, 632-646.
\[ \begin{align*} \mathbf{z} \left(t \right) - \mathbf{X} \left( t \right) \boldsymbol{\gamma} & = \mathbf{M}\left(t\right) \left( \mathbf{z} \left(t-1 \right) - \mathbf{X} \left( t \right) \boldsymbol{\gamma} \right) + \boldsymbol{\eta} \left(t \right) \end{align*} \]
\(\mathbf{M}(t) = \rho \mathbf{I}_n\) is a propagator matrix.
\(\mathbf{X} \left(t \right) \boldsymbol{\gamma}\) are the fixed effects from covariates like latitude, elevation, etc.
\(\boldsymbol{\eta} \left( t \right) \sim \operatorname{N} \left( \mathbf{0}, \mathbf{R}\left( \boldsymbol{\theta} \right) \right)\).
\(\mathbf{R} \left( \boldsymbol{\theta} \right)\) is a Mátern spatial covariance matrix with parameters \(\boldsymbol{\theta}\).
\[ \boldsymbol{\eta}^{\star} \left( t \right) \sim \operatorname{N} \left( \mathbf{0}, \mathbf{R}^{\star}\left( \boldsymbol{\theta} \right) \right) \]
Banerjee S, Gelfand AE, Finley AO, Sang H (2008). “Gaussian predictive, process models for large spatial data sets.” Journal of the Royal, Statistical Society: Series B (Statistical Methodology), 70(4),, 825-848.
\(\boldsymbol{\eta} \left( t \right) \approx \mathbf{r} \left(\mathbf{s}, \mathbf{s}^{\star} \right) \mathbf{R}^{\star}\left( \boldsymbol{\theta} \right)^{-1} \tilde{\boldsymbol{\eta}} \left( t \right)\).
The predictive process can be shown to be the optimal predictor of the parent process \(\boldsymbol{\eta} \left( t \right)\) of dimension \(m\)
The dynamic climate process becomes
\[ \begin{align*} \mathbf{z} \left(t \right) - \mathbf{X} \left( t \right) \boldsymbol{\gamma} & \approx \mathbf{M}\left(t\right) \left( \mathbf{z} \left(t-1 \right) - \mathbf{X} \left( t \right) \boldsymbol{\gamma} \right) + \mathbf{r} \left(\mathbf{s}, \mathbf{s}^{\star} \right) \mathbf{R}^{\star}\left( \boldsymbol{\theta} \right)^{-1} \boldsymbol{\tilde{\eta}} \left(t \right) \end{align*} \]
devtools::install_github("jtipton25/BayesComposition")
C++ using Rcpp package for computation speed.Fit the calibration model.
Use posterior distribution from stage (1) to generate predictions of climate independent in space and time.
Smooth the posterior distribution from stage (2) using dynamic linear model.
Use posterior mean estimates which does not fully quantify model uncertainty.
Goal: Use recursive Bayesian ideas from end of talk.
Hooten MB, Johnson DS, Brost BM (2019). “Making Recursive Bayesian, Inference Accessible.” The American Statistician, 1-10.
Murray I, Adams RP, MacKay DJC (2010). “Elliptical slice sampling.” In, AISTATS, volume 13, 541-548.
Tipton JR, Hooten MB, Nolan C, Booth RK, McLachlan J (2019)., “Predicting paleoclimate from compositional data using multivariate, Gaussian process inverse prediction.” The Annals of Applied, Statistics, 13(4), 2363-2388.
\[\begin{align*} \mathbf{y}(\mathbf{s}, t) & \sim \operatorname{Multinomial}(M(\mathbf{s}, t), \pi_{SB}(\boldsymbol{\eta}(\mathbf{s}, t))) \end{align*}\]
\(\pi_{SB}(\boldsymbol{\eta}(\mathbf{s}, t))\) is a stick breaking transformation.
\[\begin{align*} [\mathbf{y} | M, \pi_{SB}(\boldsymbol{\eta})] & =\operatorname{Multinomial}(\mathbf{y} | M, \pi_{SB}(\boldsymbol{\eta})) \\ & = \prod_{j=1}^{J-1} \operatorname{binomial}(y_j | M_j, \tilde{\pi}_j) \\ & = \prod_{j=}^{J-1} {M_j \choose y_j} \frac{(e^{\eta_j})^{y_j}}{(1 + e^{\eta_j})^{M_j} } \end{align*}\]
\[\begin{align*} \frac{(e^{\eta_j})^{y_j}}{(1 + e^{\eta_j})^{M_j} } & = 2^{-M_j} e^{\kappa(y_j) \eta_j} \int_0^\infty e^{- \omega_j \eta_j^2 / 2 } [\omega_j | M_j, 0] \,d \omega \end{align*}\]
\[\begin{align*} [\eta_j, y_j] & = [\eta_j] {M_j \choose y_j} \frac{(e^{\eta_j})^{y_j}}{(1 + e^{\eta_j})^{M_j} }\\ & = \int_0^\infty [\eta_j] {M_j \choose y_j} 2^{-M_j} e^{\kappa(y_j) \eta_j} e^{- \omega_j \eta_j^2 / 2 } [\omega_j | M_j, 0] \,d \omega \end{align*}\]
where the integral defines a joint density over \([\eta_j, y_j, \omega_j]\).
Using this integral representation, we have
\[\begin{align*} \omega_j | \eta_j, y_j & \sim \operatorname{PG( M_j, \eta_j)} \end{align*}\]
which can be sampled using the exponential tilting property of the Pólya-gamma distribution
There is a cost:
\[\begin{align*} \eta_j(\mathbf{s}, t) = \color{blue}{\beta_{0j}} + \color{blue}{\beta_{1j}} \left( \mathbf{x}'(\mathbf{s}, t) \boldsymbol{\gamma} + \mathbf{w}'(\mathbf{s}) \boldsymbol{\alpha}(t) \right) + \color{blue}{\varepsilon(\mathbf{s}, t)} \end{align*}\]
\[\begin{align*} \eta_j(\mathbf{s}, t) = \beta_{0j} + \beta_{1j} \left( \color{red}{\mathbf{x}'(\mathbf{s}, t) \boldsymbol{\gamma}} + \color{purple}{\mathbf{w}'(\mathbf{s}) \boldsymbol{\alpha}(t)} \right) + \varepsilon(\mathbf{s}, t) \end{align*}\]
\(\color{red}{\mathbf{X}(t) = \begin{pmatrix} \mathbf{x}'(\mathbf{s}_1, t) \\ \vdots \\ \mathbf{x}'(\mathbf{s}_n, t) \end{pmatrix}}\) are fixed covariates (elevation, latitude, etc.).
We assume \(\color{red}{\mathbf{X}(t) \equiv \mathbf{X}}\) for all \(t\) although temporally varying covariates are possible (volcanic forcings, Milankovitch cycles, etc.).
\(\color{purple}{\mathbf{W} = \begin{pmatrix} \mathbf{w}'(\mathbf{s}_1) \\ \vdots \\ \mathbf{w}'(\mathbf{s}_n) \end{pmatrix}}\) are spatial basis functions with temporal random effects \(\color{purple}{\boldsymbol{\alpha}(t)}\).
\(\mathbf{Z}_0 \sim \operatorname{N} (\color{red}{\mathbf{X}'(1) \boldsymbol{\gamma}} + \color{purple}{\mathbf{W} \boldsymbol{\alpha}(1)}, \sigma^2_0 \mathbf{I})\) is the observed modern climate state.
\[\begin{align*} \color{purple}{\mathbf{w}'(\mathbf{s}) \boldsymbol{\alpha}(t)} = \color{purple}{\sum_{m=1}^M \mathbf{w}_m'(\mathbf{s}) \boldsymbol{\alpha}_m(t)} \end{align*}\]
\[\begin{align*} \color{purple}{\boldsymbol{\alpha}_m(t)} & \sim \operatorname{N} \left( \mathbf{A}_m \color{purple}{\boldsymbol{\alpha}_m(t-1)}, \tau^2 \mathbf{Q}_m^{-1} \right) \end{align*}\]